Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PoC for ragna-base package with just base dependencies #405

Closed
wants to merge 28 commits into from

Conversation

arjxn-py
Copy link
Contributor

@arjxn-py arjxn-py commented May 7, 2024

This PR aims to give user the total control with pip install ragna i.e. which will correspond to current ragna[all] & also proposes a proof of concept for ragna-base which will give user the fine-grained control i.e. will further correspond to current ragna.
Related to #397


NOTE

  • This PR is still work in progress and potentially has much to be added.
  • Currently I'm trying to stick to these setuptools guidelines
  • Two packages like this can be achieved with two separate directories & pyproject.toml but that'll be a pain for maintenance, hence I'll try to achieve this in as minimal changes as possible
  • The related functions and methods will also be modified relatively once the core functionality is achieved.

@arjxn-py arjxn-py marked this pull request as draft May 7, 2024 02:16
@pmeier
Copy link
Member

pmeier commented May 7, 2024

Two packages like this can be achieved with two separate directories & pyproject.toml but that'll be a pain for maintenance, hence I'll try to achieve this in as minimal changes as possible

I don't think we can avoid that. Quoting from the setuptools documentation

Currently the following fields can be listed as dynamic: version, classifiers, description, entry-points, scripts, gui-scripts and readme.

Note that there is no mention of

ragna/pyproject.toml

Lines 8 to 9 in e397acb

[project]
name = "Ragna"

which we need to have Ragna and ragna-base.

@arjxn-py
Copy link
Contributor Author

arjxn-py commented May 8, 2024

I don't think we can avoid that. Quoting from the setuptools documentation

I wonder if you're suggesting for a monorepo architecture, that is absolutely possible. So I guess we'll need to have two different directories i.e. ragna & ragna-base + add custom commands/workflows to install ragna/ragna-base, test both, and release both. But at this point I want to give this approach a little try first and keep monorepo as a final resort as I am positive that it'd solve the issue.

@pmeier
Copy link
Member

pmeier commented May 8, 2024

I wonder if you're suggesting for a monorepo architecture, that is absolutely possible. So I guess we'll need to have two different directories i.e. ragna & ragna-base

My idea would be to have a packaging directory in the project that holds all the configuration for ragna-base. The files in the root of the repository should correspond to ragna.

add custom commands/workflows to install ragna/ragna-base

The install should be fairly straight-forward. For a local development environemnt we run pip install -e ./packaging && pip install . in the project root. In CI, we can drop the -e.

test both

Could you elaborate what you want to test? Do you mean our test suite or something else?

If you mean the test suite, why do you want to test both? We should just require ragna to be installed and test ragna-base with all optional dependencies installed.

But at this point I want to give this approach a little try first and keep monorepo as a final resort as I am positive that it'd solve the issue.

Sure. If we have a clean way to avoid all of the above, I prefer that as well.

@arjxn-py
Copy link
Contributor Author

arjxn-py commented May 9, 2024

Could you elaborate what you want to test? Do you mean our test suite or something else?

Yes, I was thinking that the test suite might break in case of ragna-base when some method dependent to optional dependency would be called. But since it is being suggested to run it with all optional dependencies installed for ragna-base as well, It'd not fail.

@arjxn-py
Copy link
Contributor Author

arjxn-py commented May 9, 2024

But since it is being suggested to run it with all optional dependencies installed for ragna-base as well, It'd not fail.

Still, I'm not sure if this'd raise and import issue for optional dependency methods (if it'd we would need to add informative warnings and error and also test ragna-base without optional dependencies to prevent breaking unwanted methods & keep API stable.)

@arjxn-py
Copy link
Contributor Author

arjxn-py commented May 11, 2024

From these couple of days I've been trying to configure a concurrent setup.py that co-exist with pyproject.toml with almost same configurations. At the moment, the idea behind this is to have isolated builds with pyproject.toml & setup.py for ragna & ragna-base respectively, where setup.py also seems to be building the wheels for ragna-base for me locally with name ragna_base-0.0.0-py3-none-any.whl & desired base dependencies only, in the absence of pyproject.toml.
I am currently figuring out to have isolated builds in the presence of pyproject.toml, in case I do not manage to figure that out I believe it could be handled in the build workflow then.
I'd be really happy to have any thoughts on it. Thanks.

@pmeier
Copy link
Member

pmeier commented May 13, 2024

Depending on a setup.py is a non-starter. Right now we are using setuptools, but I don't want to lock us into that. If we depend on setup.py we cannot ever us a different build backend. Also note this footnote from the setuptools documentation

New projects are advised to avoid setup.py configurations (beyond the minimal stub) when custom scripting during the build is not necessary. Examples are kept in this document to help people interested in maintaining or contributing to existing packages that use setup.py. Note that you can still keep most of configuration declarative in setup.cfg or pyproject.toml and use setup.py only for the parts not supported in those files (e.g. C extensions)

Meaning, we could do it, but even the official maintainers are advising against it. The exception that they make is for building C extensions, but we are not doing that.

For this PR this means: if we cannot do it with just a pyproject.toml, we need to go with the approach I described in #405 (comment) or something along the lines of it. I'll ask around if someone has a different idea here, but I guess this is the best we can do right now.

@arjxn-py
Copy link
Contributor Author

Oh thanks Philip for making it clear in the start. Fortunately I got one more solution while scrolling web today (thanks to this comment) which looks much promising to me as compared to setup.py solution & monorepo architecture where we'd need to maintain almost double the codebase compared to now.
Plus I think this approach is pretty scalable too (in case we'd want more ragna-x package in future), just one more pyproject.toml & @nox.session. I'm pushing related changes in sometime.
Happy to have thoughts on it!

@pmeier
Copy link
Member

pmeier commented May 13, 2024

monorepo architecture where we'd need to maintain almost double the codebase compared to now.

I don't understand that. Basically, the only duplication would be the metadata in the pyproject.toml.

Note that the new ragna package is only a meta package. Meaning, it doesn't contain any code of its own, but only has some dependencies. So we don't duplicate anything but the metadata.

…while symlinking with any of them dynamically
@arjxn-py
Copy link
Contributor Author

With the current changes I'm able to build Ragna with nox -s build i.e. :

Ragna==0.1.0.dev199+gf9b8c39.d20240513110506
├── aiofiles [required: Any, installed: 23.2.1]
├── chromadb [required: >=0.4.13, installed: 0.5.0]
├── emoji [required: Any, installed: 2.11.1]
├── fastapi [required: Any, installed: 0.111.0]
├── httpx [required: Any, installed: 0.27.0]
├── httpx-sse [required: Any, installed: 0.4.0]
├── ijson [required: Any, installed: 3.2.3]
├── lancedb [required: >=0.2, installed: 0.6.13]
├── packaging [required: Any, installed: 24.0]
├── panel [required: ==1.3.8, installed: 1.3.8]
├── pyarrow [required: Any, installed: 15.0.0]
├── pydantic [required: >=2, installed: 2.7.1]
├── pydantic_core [required: Any, installed: 2.18.2]
├── pydantic-settings [required: >=2, installed: 2.2.1]
├── PyJWT [required: Any, installed: 2.8.0]
├── PyMuPDF [required: >=1.23.6, installed: 1.24.3]
├── python-docx [required: Any, installed: 1.1.2]
├── python-multipart [required: Any, installed: 0.0.9]
├── python-pptx [required: Any, installed: 0.6.23]
├── questionary [required: Any, installed: 2.0.1]
├── redis [required: Any, installed: 5.0.4]
├── rich [required: Any, installed: 13.7.1]
├── SQLAlchemy [required: >=2, installed: 2.0.30]
├── starlette [required: Any, installed: 0.37.2]
├── tiktoken [required: Any, installed: 0.6.0]
├── tomlkit [required: Any, installed: 0.12.5]
├── typer [required: Any, installed: 0.12.3]
└── uvicorn [required: Any, installed: 0.29.0]

And ragna-base with nox -s build-base i.e. :

ragna-base==0.1.0.dev199+gf9b8c39.d20240513105812
├── aiofiles [required: Any, installed: 23.2.1]
├── emoji [required: Any, installed: 2.11.1]
├── fastapi [required: Any, installed: 0.111.0]
├── httpx [required: Any, installed: 0.27.0]
├── packaging [required: Any, installed: 24.0]
├── panel [required: ==1.3.8, installed: 1.3.8]
├── pydantic [required: >=2, installed: 2.7.1]
├── pydantic_core [required: Any, installed: 2.18.2]
├── pydantic-settings [required: >=2, installed: 2.2.1]
├── PyJWT [required: Any, installed: 2.8.0]
├── python-multipart [required: Any, installed: 0.0.9]
├── questionary [required: Any, installed: 2.0.1]
├── redis [required: Any, installed: 5.0.4]
├── rich [required: Any, installed: 13.7.1]
├── SQLAlchemy [required: >=2, installed: 2.0.30]
├── starlette [required: Any, installed: 0.37.2]
├── tomlkit [required: Any, installed: 0.12.5]
├── typer [required: Any, installed: 0.12.3]
└── uvicorn [required: Any, installed: 0.29.0]

@arjxn-py
Copy link
Contributor Author

We can also get rid of requirements.txt & requirements-base.txt as I can specify the dependencies in pyprojects or noxfile directly.

@arjxn-py
Copy link
Contributor Author

I don't understand that. Basically, the only duplication would be the metadata in the pyproject.toml.

Note that the new ragna package is only a meta package. Meaning, it doesn't contain any code of its own, but only has some dependencies. So we don't duplicate anything but the metadata.

I see. Sorry, with the comment above I misunderstood that we'd had to have ragna/ inside packaging/ which would have acted as a separate src for ragna-base

@agriyakhetarpal
Copy link
Member

Hi, @arjxn-py. I had another idea about how to go about doing this, so I am posting this comment as requested by @pmeier. I would say that having the changes in noxfile.py are great, but it also adds an extra dependency on nox switching between the pyproject.toml files (might not be a problem though), plus, you can quite easily construct your own minimal CLI-based script to do this rather than relying on nox. Using nox will create redundant virtual environments with the invocation of each session unless you specify the venv_backend to be None.

Even though the setuptools documentation might not recommend setup.py for pure Python projects, having a minimal, shim-like setup.py file with most of the metadata still present inside pyproject.toml, is a reasonable compromise. This way, I can imagine we can drop adding two separate pyproject.toml files and function with one setup.py + one pyproject.toml.

The dynamic dependencies currently noted in pyproject.toml (requirements.txt) can be read and delimited to be used as metadata in setup.py itself, i.e., what I propose is of the following form:

import os
from setuptools import setup

with open('requirements.txt') as f:
    ragna_dependencies = f.read().splitlines()

with open('requirements-base.txt') as f:
    base_dependencies = f.read().splitlines()

name = 'ragna-base' if os.environ.get("BUILD_RAGNA_BASE") else 'ragna'
dependencies = base_dependencies if os.environ.get("BUILD_RAGNA_BASE") else ragna_dependencies

setup(
    # ...
    name=name,
    version='',
    packages=['ragna'],  # or find_packages() with include= and exclude= attributes
    install_requires=dependencies,
    # and so on
)

while the rest of the project metadata stays in pyproject.toml, because builders will construct and collate PEP 621 metadata from both of these files (and setup.cfg as well, but that's not being added here).

This is where you can switch between particular requirements at the time of running the build frontend (python -m build) to build the source distribution and the wheel with this environment variable (I don't recommend using --config-settings because setuptools does not seem to handle them very well in comparison to other build backends). This approach might require some changes to test_importable.py and the package structure with the use of setuptools.find_packages(), so that both ragna and ragna-base do not break on import ragna respectively (I am assuming as someone who hasn't used ragna before that both ragna and ragna-base will provide their public API under ragna and a separate ragna_base namespace is not being proposed). It is to be noted that this approach does restrict the build backend to setuptools and might not be useful if ragna were to switch to something like hatchling sometime later.


P.S. Sorry, I meant to post this yesterday – I didn't hit the "Comment" button and this comment stayed as a draft, oops! Now that I think of it, to avoid using setup.py you can also use the toml package and "construct" your pyproject.toml file through it with a helper script (similar to the noxfile.py you added, but with PEP 723 style metadata or something), i.e., dropping a particular TOML table or adding items to it at the time of building the base package. I would recommend modifying the importable GitHub Actions job in order to verify that it passes for both ragna and ragna-base, in any case (you can either use a matrix or split into two jobs).

P.P.S. Another option is to look at how fastapi and typer projects do it with their PDM backend – however, their implementation has not been published as a separate project and remains restricted to internal use cases, and adapting to that could take a bit of extra effort.

@arjxn-py
Copy link
Contributor Author

Hey @agriyakhetarpal, thanks for such a descriptive comment, much helpful. I'd be happy to integrate the suggested changes and would potentially get rid of nox.

P.P.S. Another option is to look at how fastapi and typer projects do it with their PDM backend – however, their implementation has not been published as a separate project and remains restricted to internal use cases, and adapting to that could take a bit of extra effort.

Just to let you know, I've tried to figure out what they've been doing and not been able to get more information about the significant of environment variable they're using and how they're handling it.

@arjxn-py
Copy link
Contributor Author

arjxn-py commented May 22, 2024

For current changes Build for Ragna package & Build for ragna-base package both builds for ragna in spite of me trying to overwrite the name and dependencies in shim setup.py

In next commit I'll compare it to the approach using nox

@arjxn-py
Copy link
Contributor Author

arjxn-py commented May 30, 2024

Hi @pmeier, this past week I had been doing some hit & trials without noxfile so that I can prevent dependence on it.

And this current solution appear to work seamlessly which works without the need to have two separate pyprojects, noxfile or setup.py. Plus it also fulfills the things being discussed in #397 including #397(comment).

The couple of processes that are failing should be easily fixable but before that I'm marking this ready for review for an initial review. Thanks.

cc: Also thanks @agriyakhetarpal because of which I started to look for solution other than the nox one.

For reference, here is the build for ragna & ragna-base

@arjxn-py arjxn-py marked this pull request as ready for review May 30, 2024 21:21
arjxn-py and others added 2 commits June 2, 2024 18:29
Co-authored-by: Agriya Khetarpal <74401230+agriyakhetarpal@users.noreply.github.com>
@arjxn-py arjxn-py changed the title [WIP] Install optional dependencies by default PoC for ragna-base package with just base dependencies Jun 2, 2024
Copy link
Member

@pmeier pmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @arjxn-py for the current version. I left a few comments, but nothing major.

However, one thing is not how we discussed it. The difference between ragna and ragna-base in this PR is that ragna has a few more dependencies than ragna-base. But the ragna package still distributes our code itself rather than depending on ragna-base.

ragna should be a meta package that does depend on ragna-base (same version number) and all optional dependencies, but should not distribute any code itself.

scripts/update_optional_dependencies.py Outdated Show resolved Hide resolved
scripts/update_optional_dependencies.py Outdated Show resolved Hide resolved
scripts/build_helper.py Outdated Show resolved Hide resolved
scripts/build_helper.py Outdated Show resolved Hide resolved
scripts/build_helper.py Outdated Show resolved Hide resolved
scripts/build_helper.py Outdated Show resolved Hide resolved
Co-authored-by: Philip Meier <github.pmeier@posteo.de>
@arjxn-py
Copy link
Contributor Author

arjxn-py commented Jun 6, 2024

ragna should be a meta package that does depend on ragna-base (same version number) and all optional dependencies, but should not distribute any code itself.

Yes, current ragna is not a metapackage, but I feel that the current method is cleaner and fulfills the requirements of a metapackage.

Note that a metapackage does not have any API of its own; it merely provides dependencies for other packages. Therefore, if we want to make ragna a metapackage, it should install ragna-base and optional dependencies as needed. Which we may try once the ragna-base is published on pypi maybe (i.e. in requirements.txt we should be able to replace all base dependencies with just ragna-base)

@arjxn-py arjxn-py requested a review from pmeier June 7, 2024 13:06
Copy link
Member

@pmeier pmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, current ragna is not a metapackage, but I feel that the current method is cleaner and fulfills the requirements of a metapackage.

It does not. The problem is that you now have two separate distributions (ragna and ragna-base) that install the ragna package into your environment. Worse, they don't even know of each other.

So what happens if someone specifies something like pip install ragna==0.3.0 ragna-base==0.2.0? No one would do this in their direct requirements, but maybe ragna and ragna-base are being pulled in as dependencies?

To avoid this, we need to have the ragna distribution a metapackage that installs ragna-base. That way, the install command will net you dependency conflict. And there is no ambiguity which distribution actually supplies the ragna package.

Note that a metapackage does not have any API of its own; it merely provides dependencies for other packages. Therefore, if we want to make ragna a metapackage, it should install ragna-base and optional dependencies as needed. Which we may try once the ragna-base is published on pypi maybe (i.e. in requirements.txt we should be able to replace all base dependencies with just ragna-base)

Yes, this is exactly what I want to achieve. Why do you want to wait for ragna-base being published though?

@arjxn-py
Copy link
Contributor Author

arjxn-py commented Jun 13, 2024

Thanks for your review & apologies for bit of a delay in response.

So what happens if someone specifies something like pip install ragna==0.3.0 ragna-base==0.2.0? No one would do this in their direct requirements, but maybe ragna and ragna-base are being pulled in as dependencies?

I tried doing that locally with wheels of ragna & ragna-base which had different local version identifier i.e.

  • ragna-0.1.0.dev227+g9b0f1c2.d20240612161602-py3-none-any.whl
  • ragna_base-0.1.0.dev227+g9b0f1c2.d20240612161323-py3-none-any.whl

I observed that these two were getting installed subsequently without error & while I try to do ragna --version in the presence of both, it returns me the version of ragna :

ragna --version
ragna 0.1.0.dev227+g9b0f1c2.d20240612161602 from C:\Users\Arjun\miniconda3\Lib\site-packages\ragna

I also then uninstalled ragna which must've removed -

  • c:\users\arjun\miniconda3\lib\site-packages\ragna-0.1.0.dev227+g9b0f1c2.d20240612161602.dist-info\*
  • c:\users\arjun\miniconda3\lib\site-packages\ragna\*
  • c:\users\arjun\miniconda3\scripts\ragna.exe

I then needed to re-install ragna-base & did ragna --version again which returned :

ragna --version
ragna 0.1.0.dev227+g9b0f1c2.d20240612161323 from C:\Users\Arjun\miniconda3\Lib\site-packages\ragna

As far as I observed, priority is ragna > ragna-base in case of both are installed at the same time.

@arjxn-py
Copy link
Contributor Author

arjxn-py commented Jun 13, 2024

To avoid this, we need to have the ragna distribution a metapackage that installs ragna-base. That way, the install command will net you dependency conflict. And there is no ambiguity which distribution actually supplies the ragna package.

Assuming that you mean this (So sorry if I assumed wrong) :

Dependencies for ragna Base deps replaced with ragna-base
aiofiles
emoji
fastapi
httpx
importlib_metadata>=4.6; python_version<'3.10'
packaging
panel==1.4.2
pydantic>=2
pydantic-core
pydantic-settings>=2
PyJWT
python-multipart
redis
questionary
rich
sqlalchemy>=2
starlette
tomlkit
typer
uvicorn
chromadb>=0.4.13
httpx_sse
ijson
lancedb>=0.2
pyarrow
pymupdf>=1.23.6
python-docx
python-pptx
tiktoken
ragna-base
chromadb>=0.4.13
httpx_sse
ijson
lancedb>=0.2
pyarrow
pymupdf>=1.23.6
python-docx
python-pptx
tiktoken

I'd like to make a note that both of these serves the same purpose but there's a potential problem with if we were to use ragna-base from pypi as a dependency for ragna. They would not be in sync, because even if we updated the dependencies for ragna-base and then wanted to in a corresponding manner update the dependencies for ragna, we would need to put out a release for ragna-base on PyPI first and then publish ragna after that, which could cause developers troubles.

Yes, this is exactly what I want to achieve. Why do you want to wait for ragna-base being published though?

It's because package manager wouldn't be able to identify what ragna-base is unless we have it published this is a bit related to what I've mentioned above.

@arjxn-py
Copy link
Contributor Author

I am really sorry if i'm being a little persuasive, I just want to make sure that we explore optimized solutions plus all the edge cases that I could think of at the moment which may cause potential bottleneck situations in the future. However I'm trying to answer to the best of my current knowledge & capacity I believe that I may also not be fully correct so I'd be happy to move forward upon how you decide after a collective discussion.

@arjxn-py
Copy link
Contributor Author

arjxn-py commented Jun 29, 2024

Just a gentle follow up, I've replaced the base dependencies with ragna-base for ragna & CI is failing as expected as pip cannot identify what ragna-base is unless it is published.
But yes the implementation can stick to this #397suggestion

Sorry, i didn't want to bother you with pinging and was expecting your response whenever you had chance.

@arjxn-py arjxn-py requested a review from pmeier June 29, 2024 23:18
@pmeier
Copy link
Member

pmeier commented Jul 1, 2024

Sorry, i didn't want to bother you with pinging and was expecting your response whenever you had chance.

Don't worry, you have every right to ping when I don't get back to you. I'm sorry for that. I should have a chance to look at this in the next two days.

@pmeier
Copy link
Member

pmeier commented Jul 4, 2024

@arjxn-py I'm not really comfortable with having both packages supply our source code. I played with the proper meta-package idea a little in #439. Could you please have a look?

@pmeier
Copy link
Member

pmeier commented Jul 22, 2024

Closing this in favor of #439 that solves a lot of the issues I mentioned here. @arjxn-py let's collaborate there.

@pmeier pmeier closed this Jul 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants